
Using outer join to capture people who start to work during the observation period

The script below shows how to work with job data from the A scheme. The example was covered in our themed course, which was held twice in 2022.

This example shows how to carry out a full count of employees over a longer period, typically across more than one point in time. By adding the import option outer_join from variable no. 2 onwards, you keep all observations for each subsequent variable, including units that have no observation for variable no. 1. This captures all new job observations (jobs that start along the way and did not exist at the time of variable no. 1). The dataset grows by the number of new units added at each import step that uses the outer_join option. Units for which no information was available at an earlier measurement time are given a missing value for those variables.

If you omit the outer_join option during import, the default "left join" is used. That is, the population is defined by variable no. 1, and data from the subsequent variables is only connected to units that already exist from import no. 1. You therefore miss all new units that did not exist at the first measurement time.

require no.ssb.fdb:30 as db

create-dataset all_jobs
import db/ARBLONN_ARB_ARBEIDSTID 2021-01-16 as worktime2101
import db/ARBLONN_ARB_ARBEIDSTID 2021-02-16 as worktime2102, outer_join
import db/ARBLONN_ARB_ARBEIDSTID 2021-03-16 as worktime2103, outer_join

summarize
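
The difference between the default left join and the outer_join option can also be illustrated outside microdata.no. The following is a minimal Python sketch, not microdata.no code: the unit ids (job_a, job_b, ...), the working hours and the helper functions left_join and outer_join are all made up for illustration, and None stands in for a missing value.

```python
# Toy illustration only: plain Python dicts stand in for imported variables.
# Unit ids and values are invented; None plays the role of a missing value.

def left_join(dataset, new_var):
    """Keep only units already in the dataset; add the new value or None."""
    return {unit: values + [new_var.get(unit)] for unit, values in dataset.items()}

def outer_join(dataset, new_var):
    """Keep every unit from either side; pad absent earlier values with None."""
    width = len(next(iter(dataset.values()), []))
    units = list(dataset) + [u for u in new_var if u not in dataset]
    return {
        unit: dataset.get(unit, [None] * width) + [new_var.get(unit)]
        for unit in units
    }

# Hypothetical working hours at three measurement times.
jan = {"job_a": 37.5, "job_b": 20.0}
feb = {"job_a": 37.5, "job_c": 30.0}   # job_c starts in February
mar = {"job_b": 20.0, "job_d": 15.0}   # job_d starts in March

# The first import defines the initial population.
start = {unit: [hours] for unit, hours in jan.items()}

with_left = left_join(left_join(start, feb), mar)
with_outer = outer_join(outer_join(start, feb), mar)

print(len(with_left), len(with_outer))  # prints: 2 4
print(with_outer["job_d"])              # prints: [None, None, 15.0]
```

With the left join, only the two jobs present in January remain, while the outer join ends with four units, and the jobs that started later carry missing values for the earlier measurement times.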